AITopics

Country: North America > United States > Illinois (0.28)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Apr-24-2026, 11:10:02 GMT

Double Gumbel Q-Learning

We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q-Learning (DoubleGum), a Deep Q-Learning algorithm applicable to both discrete and continuous control. In discrete control, we derive a closed-form expression for the loss function of our algorithm. In continuous control, this loss function is intractable, so we derive an approximation with a hyperparameter whose value regulates pessimism in Q-Learning. We present a default value for this pessimism hyperparameter that enables DoubleGum to outperform DDPG, TD3, SAC, XQL, quantile regression, and Mixture-of-Gaussian Critics in aggregate over 33 tasks from DeepMind Control, MuJoCo, MetaWorld, and Box2D, and show that tuning this hyperparameter may further improve sample efficiency.
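As background for why Gumbel noise matters in Q-Learning, the sketch below illustrates the textbook Gumbel-max property rather than the DoubleGum loss itself: if every action value is perturbed by i.i.d. standard Gumbel noise, the maximum is again Gumbel distributed with location logsumexp(Q), so its mean is logsumexp(Q) plus the Euler-Mascheroni constant, which always exceeds the true max. The unit noise scale (homoscedastic, unlike the heteroscedastic noise in the abstract) and the action values are assumptions chosen for illustration.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (mean of a standard Gumbel)


def logsumexp(q):
    """Numerically stable log(sum(exp(q)))."""
    m = np.max(q)
    return m + np.log(np.sum(np.exp(q - m)))


def noisy_max_mean(q, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[max_a (Q(a) + G_a)] with G_a ~ Gumbel(0, 1) i.i.d."""
    rng = np.random.default_rng(seed)
    noise = rng.gumbel(loc=0.0, scale=1.0, size=(n_samples, len(q)))
    return np.max(q + noise, axis=1).mean()


q = np.array([1.0, 0.5, -0.2])  # hypothetical "true" action values
print("true max:        ", q.max())
print("Monte Carlo mean:", noisy_max_mean(q))
print("closed form:     ", logsumexp(q) + EULER_GAMMA)
```

The gap between the noisy maximum and the true maximum is a simple instance of the overestimation bias that motivates pessimistic targets in Q-Learning.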

artificial intelligence, machine learning, reinforcement learning

Country:

North America > Canada (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report > New Finding (0.45)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing Systems, Feb-7-2026, 23:23:29 GMT

CONSOLE: ConvexNeuralSymbolicLearning

Learning the underlying equation from data is a fundamental problem in many disciplines.

artificial intelligence, equation, machine learning

Country:

North America > United States > Illinois > Champaign County > Champaign (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Neural Information Processing Systems, Feb-7-2026, 13:13:37 GMT

07956d40074d6523bad11112b3225c6e-Paper-Conference.pdf

algorithm, continuous control, doublegum

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.27)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.92)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Neural Information Processing Systems, Dec-23-2025, 20:28:34 GMT

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

While agents trained by Reinforcement Learning (RL) can solve increasingly challenging tasks directly from visual observations, generalizing learned skills to novel environments remains very challenging. Extensive use of data augmentation is a promising technique for improving generalization in RL, but it is often found to decrease sample efficiency and can even lead to divergence. In this paper, we investigate causes of instability when using data augmentation in common off-policy RL algorithms. We identify two problems, both rooted in high-variance Q-targets. Based on our findings, we propose a simple yet effective technique for stabilizing this class of algorithms under augmentation. We perform extensive empirical evaluation of image-based RL using both ConvNets and Vision Transformers (ViT) on a family of benchmarks based on DeepMind Control Suite, as well as in robotic manipulation tasks. Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals. We further show that our method scales to RL with ViT-based architectures, and that data augmentation may be especially important in this setting.
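The abstract traces the instability to high-variance Q-targets. The toy below makes that effect measurable under heavy assumptions: a linear critic stands in for a ConvNet or ViT, additive Gaussian noise stands in for image augmentation, and all names (`q_value`, `augment`, `td_targets`) are hypothetical. It compares the spread of the one-step target r + γ·Q(s′) when s′ is augmented versus left clean; it illustrates the problem only and is not the paper's stabilization technique.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)


def q_value(w, obs):
    # Toy linear critic standing in for a ConvNet/ViT Q-network (hypothetical).
    return obs @ w


def augment(obs, rng, sigma=0.5):
    # Hypothetical augmentation: additive Gaussian noise on the observation.
    return obs + rng.normal(0.0, sigma, size=obs.shape)


def td_targets(w, reward, next_obs, n_draws, augment_next, seed=0):
    # One-step TD targets r + gamma * Q(s'), recomputed under n_draws augmentations.
    rng = np.random.default_rng(seed)
    targets = []
    for _ in range(n_draws):
        obs = augment(next_obs, rng) if augment_next else next_obs
        targets.append(reward + GAMMA * q_value(w, obs))
    return np.array(targets)


w = np.array([1.0, -2.0, 0.5])
next_obs = np.array([0.3, 0.1, -0.4])
aug = td_targets(w, 1.0, next_obs, 10_000, augment_next=True)
clean = td_targets(w, 1.0, next_obs, 10_000, augment_next=False)
print("target variance, augmented s':", aug.var())
print("target variance, clean s':    ", clean.var())
```

For this linear toy the augmented-target variance is γ²σ²‖w‖², so it grows with both the augmentation strength and the critic's sensitivity to the input.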

augmentation, convnet and vision transformer, deep q-learning

Genre: Research Report > Promising Solution (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)

Neural Information Processing Systems, Oct-9-2024, 17:21:34 GMT

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

augmentation, convnet and vision transformer, data augmentation

Genre: Research Report > Promising Solution (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Jedlička, Adam, Guy, Tatiana Valentine

Exploration in Knowledge Transfer Utilizing Reinforcement Learning

arXiv.org Artificial Intelligence, Jul-15-2024

The contribution focuses on the problem of exploration within the task of knowledge transfer. Knowledge transfer refers to applying the knowledge gained while learning a source task to a target task. The intended benefit of knowledge transfer is to speed up learning of the target task. The article compares several exploration methods used within a deep transfer learning algorithm, namely Deep Target Transfer $Q$-learning. The methods compared are $\epsilon$-greedy, Boltzmann, and upper confidence bound exploration. The transfer learning algorithm and the exploration methods were tested on the virtual drone problem. The results show that upper confidence bound exploration performs best among these options. Its suitability for other applications remains to be verified.
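The three exploration methods compared in this entry can be sketched in their standard textbook forms as action selectors over a vector of Q-values. The abstract does not give the paper's parameterization, so the ε value, the temperature, the UCB1-style bonus with constant `c=2.0`, and the function names are assumptions.

```python
import numpy as np


def epsilon_greedy(q, epsilon, rng):
    # With probability epsilon pick a uniformly random action, otherwise the greedy one.
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))


def boltzmann(q, temperature, rng):
    # Sample an action with probability proportional to exp(Q / temperature).
    z = (np.asarray(q, dtype=float) - np.max(q)) / temperature  # shift for numerical stability
    p = np.exp(z) / np.sum(np.exp(z))
    return int(rng.choice(len(p), p=p))


def ucb(q, counts, t, c=2.0):
    # UCB1-style selection: rarely tried actions receive a larger optimism bonus.
    counts = np.asarray(counts, dtype=float)
    if np.any(counts == 0):
        return int(np.argmin(counts))  # try every action at least once
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(np.asarray(q, dtype=float) + bonus))


rng = np.random.default_rng(0)
q = np.array([0.2, 1.0, 0.5])
print(epsilon_greedy(q, 0.1, rng), boltzmann(q, 0.5, rng), ucb(q, counts=[10, 10, 1], t=21))
```

ε-greedy explores uniformly, Boltzmann explores in proportion to estimated value, and UCB directs exploration toward actions whose estimates are least certain, which is one plausible reason it can transfer well.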

knowledge, q-learning, source task

2407.10835

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

Núñez-Molina, Carlos, Fernández-Olivares, Juan, Pérez, Raúl

Learning to Select Goals in Automated Planning with Deep-Q Learning

arXiv.org Artificial Intelligence, Jun-20-2024

In this work we propose a planning and acting architecture endowed with a module that learns to select subgoals with Deep Q-Learning. This allows us to reduce the load on a planner in scenarios with real-time restrictions. We trained this architecture on a video game environment used as a standard test-bed for intelligent systems applications, and tested it on different levels of the same game to evaluate its generalization abilities. We measured the performance of our approach as more training data is made available, and compared it with both a state-of-the-art classical planner and the standard Deep Q-Learning algorithm. The results show that our model performs better than the alternative methods considered when both plan quality (plan length) and time requirements are taken into account. On the one hand, it is more sample-efficient than standard Deep Q-Learning, and it generalizes better across levels. On the other hand, it reduces problem-solving time compared with a state-of-the-art automated planner, at the expense of obtaining plans with only 9% more actions.
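The division of labour described here (a learned module picks a subgoal, a planner only has to reach it) can be sketched as follows. Everything in this example is a hypothetical stand-in: the Q-values are placeholder numbers rather than outputs of a trained network, and breadth-first search on a small grid plays the role of the classical planner.

```python
from collections import deque

import numpy as np


def bfs_plan(grid, start, goal):
    # Hypothetical stand-in for the planner: shortest path on a grid of 0 (free) / 1 (wall).
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back through predecessors
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable


def select_subgoal(q_values, candidates):
    # Greedy choice of the candidate subgoal with the highest (learned) Q-value.
    return candidates[int(np.argmax(q_values))]


grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
candidates = [(0, 2), (2, 0)]
q_values = np.array([0.7, 0.3])  # hypothetical Q-network outputs
subgoal = select_subgoal(q_values, candidates)
plan = bfs_plan(grid, (0, 0), subgoal)
print(subgoal, plan)
```

Because the planner only searches toward the nearby subgoal instead of the final goal, each planning call is smaller, which is the kind of saving the abstract attributes to the learned subgoal selector.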

architecture, dqp model, subgoal

doi: 10.1016/j.eswa.2022.117265

2406.14779

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Nevada (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Murray, Lucas, Castillo, Tatiana, Carrasco, Jaime, Weintraub, Andrés, Weber, Richard, de Diego, Isaac Martín, González, José Ramón, García-Gonzalo, Jordi

Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement

arXiv.org Artificial Intelligence, Apr-12-2024

Over the past decades, the increase in both the frequency and intensity of large-scale wildfires due to climate change has emerged as a significant natural threat. Designing resilient landscapes capable of withstanding such disasters has become a pressing need, requiring the development of advanced decision-support tools. Existing methodologies, including Mixed Integer Programming, Stochastic Optimization, and Network Theory, have proven effective but are hindered by their computational demands, which limits their applicability. In response to this challenge, we propose using artificial intelligence techniques, specifically Deep Reinforcement Learning, to address the complex problem of firebreak placement in the landscape. We employ value-function-based approaches such as Deep Q-Learning, Double Deep Q-Learning, and Dueling Double Deep Q-Learning. Using the Cell2Fire fire spread simulator combined with Convolutional Neural Networks, we have implemented a computational agent capable of learning firebreak locations within a forest environment, achieving good results. Furthermore, we incorporate a pre-training loop that initially teaches our agent to mimic a heuristic-based algorithm, and observe that the agent consistently exceeds the performance of these heuristic solutions. Our findings underscore the potential of Deep Reinforcement Learning for operational research challenges, especially in fire prevention. Our approach converges with highly favorable results in problem instances as large as 40 x 40 cells, marking a significant milestone in applying Reinforcement Learning to this critical issue. To the best of our knowledge, this study is a pioneering effort in using Reinforcement Learning to address this problem, offering promising perspectives in fire prevention and landscape management.
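The three value-function methods named in this entry differ in how the bootstrap target is formed and how Q-values are produced. The sketch below shows the standard textbook forms for a single transition: the DQN target, the Double DQN target (online network selects the next action, target network evaluates it), and the dueling aggregation Q = V + A − mean(A). Function names are hypothetical, and the network outputs are passed in as plain arrays rather than computed by the CNNs the paper uses.

```python
import numpy as np


def dqn_target(r, q_next_target, gamma, done):
    # DQN: the target network both selects and evaluates the next action.
    return r + gamma * (1.0 - done) * np.max(q_next_target)


def double_dqn_target(r, q_next_online, q_next_target, gamma, done):
    # Double DQN: the online network selects, the target network evaluates.
    a_star = int(np.argmax(q_next_online))
    return r + gamma * (1.0 - done) * q_next_target[a_star]


def dueling_q(value, advantages):
    # Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()


q_online = np.array([1.0, 3.0, 2.0])  # hypothetical online-network Q(s', .)
q_target = np.array([2.0, 0.5, 4.0])  # hypothetical target-network Q(s', .)
print(dqn_target(1.0, q_target, 0.9, 0.0))
print(double_dqn_target(1.0, q_online, q_target, 0.9, 0.0))
print(dueling_q(2.0, [1.0, -1.0, 0.0]))
```

Decoupling selection from evaluation is what lets Double DQN reduce the overestimation that the plain max in the DQN target introduces; the dueling head is orthogonal and can be combined with either target, as in Dueling Double Deep Q-Learning.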

algorithm, deep reinforcement learning, learning

2404.08523

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Coelho, Rodrigo, Sequeira, André, Santos, Luís Paulo

VQC-Based Reinforcement Learning with Data Re-uploading: Performance and Trainability

arXiv.org Artificial Intelligence, Jan-21-2024

Reinforcement Learning (RL) consists of designing agents that make intelligent decisions without human supervision. When used alongside function approximators such as Neural Networks (NNs), RL is capable of solving extremely complex problems. Deep Q-Learning, an RL algorithm that uses Deep NNs, achieved superhuman performance in some specific tasks. However, it is also possible to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments. More specifically, we study how data re-uploading affects both of these metrics. We show that the magnitude and the variance of the gradients of these models remain substantial throughout training due to the moving targets of Deep Q-Learning. Moreover, we empirically show that, for a parameterized quantum circuit (PQC) approximating a 2-design, increasing the number of qubits does not make the magnitude and variance of the gradients vanish exponentially, contrary to what the Barren Plateau Phenomenon would predict. This hints that VQCs may be especially well suited as function approximators in such a context.
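Data re-uploading means encoding the classical input into the circuit repeatedly, interleaved with trainable gates, rather than once at the start. The single-qubit state-vector simulation below is a toy illustration of that idea, not the circuits studied in the paper: the RX encoding with a trainable input scale `w_l`, the trainable RY layers, and the Pauli-Z readout are assumptions, and real VQC-based Q-networks use several qubits with entangling gates.

```python
import numpy as np

PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rx(angle):
    # Single-qubit rotation about the X axis.
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)


def ry(angle):
    # Single-qubit rotation about the Y axis.
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def reuploading_expectation(x, weights, thetas):
    # Each layer re-encodes the input as RX(w_l * x), then applies a trainable RY(theta_l).
    state = np.array([1.0, 0.0], dtype=complex)  # start in |0>
    for w, theta in zip(weights, thetas):
        state = ry(theta) @ (rx(w * x) @ state)
    # Expectation value <psi| Z |psi>, used as the model output.
    return float(np.real(np.conj(state) @ PAULI_Z @ state))


print(reuploading_expectation(0.3, weights=[1.0, 0.5, 2.0], thetas=[0.1, -0.4, 0.8]))
```

Each additional layer feeds `x` in again, so the output becomes a richer trigonometric function of the input than a single encoding could produce; in a Q-learning setting one such readout per action would play the role of a Q-value.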

algorithm, data re-uploading, gradient

2401.11555

Country: Europe > Portugal > Braga > Braga (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)